4 inputs:

3 time series outputs (each with 512 timepoints):

Observations were produced by running the model at some \(\textbf{x}^*\), and adding noise sampled from a Matérn 3/2 kernel, with priors on the variance and correlation length:

There was also uncorrelated Normal iid noise added, with variance \(\sigma^2 = 10^{-6}\).

Wave 1

Sampled a 100-member LHC (grey), comparing to observations (red) and true ‘best’ model run (green - NOT KNOWN UNTIL THE END):

Alternatively, plotting residuals:

Plot basis and coefficients for e.g., flow 1:

##   Input      value Output      coeff
## 1   kMV -0.4105579     c1 -57.654972
## 2   kMV  0.9304734     c1 -20.741136
## 3   kMV  0.2210836     c1  47.747129
## 4   kMV -0.6875105     c1 -51.121585
## 5   kMV  0.2702968     c1  21.977666
## 6   kMV -0.2430006     c1  -5.173414

Emulation

Fitted emulators separately for flow1, flow2 and pressure instead of combining, just to give a bit more precision for each, and to make it easier to handle the error (all have a different sample from the covariance function, uncorrelated so may have different magnitude). May not make any difference, perhaps go back and compare later.

However, this choice allows to easily identify e.g. best runs for each, which may be useful, rather than the best runs being dominated by flow/pressure with highest magnitude/variability.

In all cases, emulation fairly accurate, and for each emulated vector, variance low relative to the spread of coefficients. May have been possible to emulate additional vectors.

Flow 1

Flow 2

Pressure

Pressure was the least variable, and only 2 vectors were required to explain \(>99%\). Added a 3rd anyway:

History matching

For each of the 3 outputs, calculated the multivariate implausibility.

For the error/discrepancy matrix, combined these into a single variance matrix (often called this \(\textbf{W} = \boldsymbol{\Sigma}_e + \boldsymbol{\Sigma}_{\eta}\)). The issue here is that this matrix needs to be fixed. If we fix the correlation length \(\ell\), i.e. so that the correlation structure of \(\boldsymbol{\Sigma}_{\eta}\) is fixed, then changes in \(\sigma^2_m\) simply scale this matrix and hence \(\textbf{W}\) (essentially ignoring the effect of \(\boldsymbol{\Sigma}_e\), as \(10^{-6} << \sigma^2_m\) in general).

This means that, although we don’t know \(\sigma^2_m\), we can obtain a lower bound on the implausibility of \(\textbf{x}\) (for a fixed \(\ell\)) by setting \(\sigma^2_m = 22.5\), i.e. at the maximum value from its prior. Any decrease in \(\sigma^2_m\) will increase the \(\mathcal{I}(\textbf{x})\) (again, with everything else still fixed), and hence if we rule out \(\textbf{x}\) with \(\sigma^2_m = 22.5\), then we would also rule it out for any lower values, and hence \(\textbf{x}\) is not possible across the full support of \(\sigma^2_m\).

As an initial exploration, let’s allow a fixed set of choices for \(\ell\), namely in the set \(\ell = \{ 0, 0.05, 0.1, \ldots, 0.85 \}\), covering the prior range. We therefore just need to construct 18 versions of \(\textbf{W}\), and then calculate the implausibility for each (for flow1, flow2 and pressure).

Given this, what percentage of parameter space is NROY?

##          l   Flow1   Flow2 Pressure     All
## 1  1.0e-10 0.91198 0.10429  0.23242 0.07443
## 2  5.0e-02 1.00000 1.00000  0.96080 0.96080
## 3  1.0e-01 1.00000 1.00000  0.98982 0.98982
## 4  1.5e-01 0.97167 0.52023  0.76542 0.49801
## 5  2.0e-01 0.00000 0.00000  0.00000 0.00000
## 6  2.5e-01 0.00000 0.00000  0.00000 0.00000
## 7  3.0e-01 0.00000 0.00000  0.00000 0.00000
## 8  3.5e-01 0.00000 0.00000  0.00000 0.00000
## 9  4.0e-01 0.00000 0.00000  0.00000 0.00000
## 10 4.5e-01 0.00000 0.00000  0.00000 0.00000
## 11 5.0e-01 0.00000 0.00000  0.00000 0.00000
## 12 5.5e-01 0.00000 0.00000  0.00000 0.00000
## 13 6.0e-01 0.00000 0.00000  0.00000 0.00000
## 14 6.5e-01 0.00000 0.00000  0.00000 0.00000
## 15 7.0e-01 0.00000 0.00000  0.00000 0.00000
## 16 7.5e-01 0.00000 0.00000  0.00000 0.00000
## 17 8.0e-01 0.00000 0.00000  0.00000 0.00000
## 18 8.5e-01 0.00000 0.00000  0.00000 0.00000

This seems like fairly strong evidence that \(\ell < 0.2\). If we increased the variance, we would get non-empty NROY spaces for higher correlations, however this is not allowed by the prior. It’s likely that the true variance is lower than 22.5, hence if we knew this these spaces would be smaller.

Now we have an issue - which NROY space is correct? Any of these \(\ell\)s could be true, and we can’t arbitrarily reduce the variance to reduce the NROY space (22.5 might be correct). If the true answer is \(\ell = 0.1\) and \(\sigma^2_m = 22.5\), all (or almost all) of parameter space is in NROY. This suggests that the variance is probably too high, but doesn’t seem reasonable to discount - certainly can be considered not implausible.

We can’t necessarily rule anything out, but let’s target the new wave on the better parts of space, so that we can then improve emulation in this region. Hence defined ‘NROY’ as any parameter setting that is in the lowest 5% of implausibilities, for any \(\ell \leq 0.2\) (adding a bit of tolerance, in case \(\ell = 0.2\) is all ruled out for e.g. poor emulation), for either flow1/flow2/pressure.

If all of these ‘best 5%s’ were unique, then ‘NROY’ would consist of 75% of space. In fact, it is 13.113% - unsurprisingly there’s overlap.

Plotting better region

What does this space look like?

Wave 2

For wave 2, sampled 100 runs from the set of best (better) runs, with this design chosen by aiming to maximise the minimum distance between points. Plotting these 100 new runs (in turqouise, over red wave 1):

We seem to have been reasonably successful in identifying runs that are more similar to the observations, whilst hopefully not over-constraining - there’s still some variety in patterns.

At wave 1, we didn’t have residuals that looked like the true residual (green, unknown), but at wave 2, we’ve successfully identified residuals similar to this:

Wave 3

Successfully zoomed in further - new runs at wave 3 are a subset of the spread at wave 2 (and it turns out, relatively consistent with the true unknown model run, green line):

Residuals at this wave still contain the ‘true’ residual (green line) - which was unknown, hence method seems to be doing quite well at finding this!

Results

Fitted emulator using the 300 runs across the 3 waves.

Plotting the 500 runs submitted, along with the values of \(\sigma^2, \ell\):

##       kMV             alpha             lrrA            lrrV      
##  Min.   :209305   Min.   :0.8747   Min.   :20.46   Min.   :21.59  
##  1st Qu.:240866   1st Qu.:0.8821   1st Qu.:29.04   1st Qu.:27.24  
##  Median :256623   Median :0.8839   Median :31.26   Median :28.31  
##  Mean   :257681   Mean   :0.8838   Mean   :31.44   Mean   :28.68  
##  3rd Qu.:271285   3rd Qu.:0.8857   3rd Qu.:33.69   3rd Qu.:30.20  
##  Max.   :298566   Max.   :0.8894   Max.   :39.61   Max.   :36.71  
##        l               var             Type          
##  Min.   :0.0900   Min.   : 4.681   Length:500        
##  1st Qu.:0.1100   1st Qu.: 8.314   Class :character  
##  Median :0.1300   Median :13.063   Mode  :character  
##  Mean   :0.1246   Mean   :12.393                     
##  3rd Qu.:0.1300   3rd Qu.:14.344                     
##  Max.   :0.1500   Max.   :22.142
##            kMV     alpha     lrrA     lrrV    l       var
## 2.5%  223697.9 0.8780456 25.11774 24.29051 0.10  6.363513
## 97.5% 294889.3 0.8880182 38.45852 33.60855 0.15 20.608741

Real history matching

I.e. what results get if use the known error (\(\ell = 0.1, \sigma^2 = 5.8\)).

Bound, and summary of implausibilities for flow1/flow2/pressure in turn:

## [1] 598.1784
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   512.5   532.7   548.8   551.1   567.2   644.4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   533.5   684.1   770.8   787.5   877.8  1344.8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   591.3   624.9   727.9   898.4  1034.1  3381.6

Proportion of parameter space not ruled out for flow1/flow2/pressure:

## [1] 0.98242 0.08657 0.06170

Proportion of runs that are in NROY for all 3 outputs:

## [1] 0.04457

Considering true impl across the waves, i.e. did we actually improve despite unknown error?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   519.3   558.5   585.5   615.1   644.4   888.6

Plotting submitted 500 runs (bright green), NROY using true error parameters (blue):

##           kMV     alpha     lrrA     lrrV
## 5818 238755.3 0.8824054 29.01729 28.24244
##      kMV alpha lrrA lrrV   l var
## 1 250000 0.885   35   25 0.1 5.8

This is not a perfect comparison, as NROY here includes any runs that are within tolerance - no weighting by ‘closeness’. The 500 runs were sampled via rejection sampling based on their likelihood, hence weighted (and so runs towards edges of NROY, closer to the bound and hence less likely, not captured by these 500 runs).

Reducing bound

If we consider reducing the bound, do we get closer to truth? i.e. BC might work well?

## [1] 0.00093

Changing bound so go from 4.5% in NROY to only 0.1% gives posteriors that look great for lrrA, lrrV, but are off for alpha and kMV (which looked good before!)

Looking at best few runs vs truth: generally too high for kMV/alpha, decent for others:

##            kMV     alpha     lrrA     lrrV
## 14015 293285.2 0.8876411 38.49479 25.15424
## 23487 253490.2 0.8854755 32.21651 27.74173
## 33995 285087.4 0.8872972 37.02594 26.43998
## 48932 282562.9 0.8882096 36.49534 24.97335
## 49529 289042.6 0.8866530 39.36475 23.63969
## 84897 295526.4 0.8878309 37.97566 25.15701
## 88814 295582.3 0.8885426 36.88157 25.62080
## 98611 282052.3 0.8883698 36.92142 25.45046
##      kMV alpha lrrA lrrV   l var
## 1 250000 0.885   35   25 0.1 5.8

Possibly suggesting that issue is NOT calibration method, but fact emulator is not yet ‘perfect’ around truth.